Expert tumor annotations and radiomics for locally advanced breast cancer in DCE-MRI for ACRIN 6657/I-SPY1

Breast cancer is one of the most pervasive forms of cancer and its inherent intra- and inter-tumor heterogeneity contributes towards its poor prognosis. Multiple studies have reported results from either private institutional data or publicly available datasets. However, current public datasets are limited in terms of having consistency in: a) data quality, b) quality of expert annotation of pathology, and c) availability of baseline results from computational algorithms. To address these limitations, here we propose the enhancement of the I-SPY1 data collection, with uniformly curated data, tumor annotations, and quantitative imaging features. Specifically, the proposed dataset includes a) uniformly processed scans that are harmonized to match intensity and spatial characteristics, facilitating immediate use in computational studies, b) computationally-generated and manually-revised expert annotations of tumor regions, as well as c) a comprehensive set of quantitative imaging (also known as radiomic) features corresponding to the tumor regions. This collection describes our contribution towards repeatable, reproducible, and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments.


Background & Summary
The spatial manifestation of inter- and intra-tumor heterogeneity in breast cancer is well established 1,2. Current breast cancer diagnosis and subsequent disease management primarily occur on the basis of histopathologic assessment and biomarkers, which are derived from the sampled tissue. Biopsies and conventional biomarkers cannot fully capture the intra-tumor heterogeneity, as they are limited by tissue sampling error, which can lead to over- or under-treatment. As such, there is a clinical need to characterize the intra-tumor heterogeneity to better understand this disease and its progression mechanisms.
The use of magnetic resonance imaging (MRI) in breast cancer screening, diagnosis, and treatment management allows for the non-invasive and longitudinal sampling of disease burden 3,4. Beyond the conventional and qualitative uses of MRI in breast cancer disease management, the field of radiomics, broadly defined as the extraction of high-throughput visual and sub-visual cues derived from medical imaging 5-7, has allowed for a quantitative characterization and assessment of the breast tumor disease burden. This has led to the development of prognostic and predictive radiomic biomarkers that capture breast intra-tumor heterogeneity, promoting personalized clinical decision making 8.
Clinical and computational studies analyzing the radiologic presentations of breast tumor disease burden require ample and diverse data to ensure robust characterization. Publicly available datasets, such as those hosted through The Cancer Imaging Archive (TCIA, www.cancerimagingarchive.net) 9, created by the National Cancer Institute (NCI) of the National Institutes of Health (NIH), provide large study cohorts for meaningful research development. Furthermore, such datasets 10-13 allow for study reproducibility and for comparison of analyses across institutions, promoting increasingly robust conclusions. However, publicly available radiographic scans require accompanying expert ground-truth tumor annotations to ensure accurate study comparisons and reproducible analyses. In addition, any computational analyses, including radiomics-based pipelines, require standardized image normalization and feature parameter selections for consistent analyses 6,7,14-16. To address these limitations, this manuscript provides the 'I-SPY1-Tumor-SEG-Radiomics' collection, which extends the current TCIA collection 'I-SPY1' (https://wiki.cancerimagingarchive.net/display/Public/I-SPY1) 17,18 with segmentation labels and a radiomic feature panel for the ACRIN 6657/I-SPY1 TRIAL cohort. The latter contains dynamic contrast enhanced (DCE) MRI images of women diagnosed with locally advanced breast cancer who underwent longitudinal neoadjuvant chemotherapy 17,18. The primary goal is to provide standardized expert image annotations and radiomic features so that researchers can conduct reproducible analyses. To this end, annotations and radiomic features for the baseline (pre-treatment) images of n = 163 women have been provided.
Based on the analyses that need to be performed, the selected cohort includes women with baseline (T1) DCE-MRI with at least two post-contrast images, to support future studies wishing to explore dynamic assessments of breast tumor behavior and treatment response prediction. For each patient visit, three MRI scans are provided over the duration of a single contrast administration: a pre-contrast image and two post-contrast images. All provided images are pre-operative and pre-treatment. Two sets of annotated labels are provided: i) structural tumor volume (STV) segmentations assessed by an expert board-certified breast radiologist, and ii) functional tumor volume (FTV) segmentations, as described in prior studies 18,19. While FTV segmentations can provide an assessment of tumor vascularity and perfusion, they are limited in describing the entire structural tumor burden, as they only account for voxels of a region of interest (ROI) above a specific intensity threshold. In contrast, the provided STV segmentations annotate the entire structural region (i.e., the whole extent) of the primary lesion. The STV segmentations have been used in prior studies, in which radiomic features extracted from the STV region yielded better prognostic performance than FTV values 20. Preliminary evaluation of radiomic features extracted from STV-defined primary lesion volumes has demonstrated improved prognostic performance over established clinical covariates 21.
The availability of annotations characterizing the functionally active regions around the lesion's ROI and the entire primary lesion structure, together with the computed radiomic features, can enable the development of prognostic and predictive biomarkers characterizing breast tumor heterogeneity. Importantly, it can also contribute to repeatable, reproducible, and comparative quantitative studies that directly utilize the TCIA ACRIN 6657/I-SPY1 TRIAL data in clinical and computational research.

Methods
The data used in this study originate from the ACRIN 6657/I-SPY1 TRIAL 18. The pre-operative DCE-MRI images of 222 women were publicly available via The Cancer Imaging Archive (TCIA) 9. From this TCIA set, 15 women were excluded from the present study due to incomplete DCE acquisition scans. A further 44 women were excluded due to incomplete histopathologic data or recurrence-free survival (RFS) outcome, or missing pre-treatment DCE-MRI scans. This resulted in the inclusion of n = 163 women for this study, for whom at least two post-contrast scans from the baseline pre-treatment DCE-MRI scans were available. Women underwent neoadjuvant chemotherapy with an anthracycline-cyclophosphamide regimen alone or followed by taxane. All women underwent longitudinal DCE-MRI imaging on a 1.5 T field-strength system. Distributions of patient histopathologic characteristics and image scanner manufacturer details can be found in Tables 1 and 2. An exemplary illustration of the spatial intra-tumor heterogeneity is shown in Fig. 1. The complete clinical metadata is available in the Supplementary Table.
Preprocessing. The preprocessing procedures involved in preparing the data for further analyses were conducted using the Cancer Imaging Phenomics Toolkit (CaPTk) 22-24, and they are outlined as follows:
1. Image format conversion: For each patient, baseline images were converted from the publicly available DICOM scans to the Neuroimaging Informatics Technology Initiative (NIfTI) 25 file format. Unlike DICOM headers, this format does not carry identifiable information; it preserves only the imaging data and the information necessary to define the data in physical coordinates.
2. Bias field correction: All the converted NIfTI images were bias corrected to rectify any non-uniformity associated with the magnetic field of the MRI scanner 26,27.
3. Data harmonization: This step is required to ensure consistency across the entire dataset, as described below.
(a). Resampling: The raw I-SPY images have different voxel resolutions, preventing cohesive analysis across the entire dataset. To mitigate this, all the images were resampled to a standard 1 mm³ isotropic resolution to ensure harmonized processing for computational algorithms. This resolution was chosen because it yields image sizes that fit in GPU memory (more details will be explained later).
(b). Z-Scoring: After the images are resampled, we Z-score them using instance-level statistics of mean and variance (i.e., computed over all timepoints of the given patient rather than over the entire dataset). Z-scoring is widely accepted, based on extended observations 28-31 that normalizing every multi-timepoint scan (i.e., instance-level normalization) to zero mean and unit variance helps to improve algorithmic generalizability and to preserve the relative intensity differences between the pre- and post-contrast scans.
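The instance-level Z-scoring in step (b) can be illustrated with a minimal sketch. It is not the CaPTk implementation: the function name `zscore_instance`, the flat-list representation of a scan, and the toy intensities are assumptions made for illustration, and a real pipeline would operate on 3-D arrays. Pooling the statistics across the three timepoints of one patient is what keeps the post-contrast scans brighter than the pre-contrast scan after normalization.

```python
from statistics import fmean, pstdev


def zscore_instance(timepoints):
    """Z-score all scans of one patient with pooled (instance-level) statistics.

    `timepoints` is a list of flat voxel-intensity lists, one per scan
    (hypothetical layout: pre-contrast, first and second post-contrast).
    """
    # Pool the voxels of every timepoint so a single mean/std is shared.
    pooled = [v for scan in timepoints for v in scan]
    mean = fmean(pooled)
    std = pstdev(pooled, mu=mean)
    if std == 0:
        raise ValueError("constant intensities: cannot z-score")
    return [[(v - mean) / std for v in scan] for scan in timepoints]


# Toy intensities (made up) for the three timepoints of a single patient.
pre = [10.0, 12.0, 11.0, 13.0]
post1 = [20.0, 25.0, 22.0, 30.0]
post2 = [18.0, 24.0, 21.0, 28.0]
normalized = zscore_instance([pre, post1, post2])

# The pooled data now have zero mean and unit variance, while the
# post-contrast scans remain brighter than the pre-contrast scan.
pooled_after = [v for scan in normalized for v in scan]
print(round(fmean(pooled_after), 6), round(pstdev(pooled_after), 6))
```

Normalizing each scan separately would instead force every timepoint to the same mean, which would erase the contrast-enhancement signal between them.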

DCE-MRI NIfTI volumes.
Three volumes have been provided for each patient from the pre-operative, pre-treatment visit. These images include the pre-contrast administration MRI scan (0000), first post-contrast image (0001), and second post-contrast image (0002).

Expert tumor annotations.
From the NIfTI images, the functional tumor volume (FTV) segmentation was identified within the region of interest (ROI), provided through TCIA, from the signal enhancement ratio image, as previously described 18,32. To generate the structural tumor volume (STV) segmentations, voxels within the FTV that lay outside the largest contiguous volume region, or more than 2 cm away from it, were manually removed. Our expert board-certified breast radiologist then identified the primary lesions in each of the n = 163 baseline DCE-MRI images, using the manually cleaned FTV segmentation as a guide. The first post-contrast image for each case was used by the radiologist to delineate the entire 3-D primary tumor segmentation for each patient. Satellite lesions were not considered in the primary tumor segmentations. ITK-SNAP (www.itksnap.org) 33 was utilized to perform the manual delineations.
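The cleaning described above was performed manually. As an illustration only, the following sketch shows one way the "largest contiguous volume region" of a binary mask could be located automatically. The voxel-set representation, the function name `largest_component`, and the choice of 6-connectivity are assumptions, and the 2 cm distance criterion is not implemented here.

```python
from collections import deque

# Face-adjacent neighbors only (6-connectivity); this choice is an assumption.
NEIGHBORS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def largest_component(voxels):
    """Return the largest 6-connected component of a set of (z, y, x) voxels."""
    remaining = set(voxels)
    best = set()
    while remaining:
        # Grow one component by breadth-first search from an arbitrary seed.
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBORS_6:
                neighbor = (z + dz, y + dy, x + dx)
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


# Toy mask: a 2x2x2 block, a two-voxel fragment, and one isolated voxel.
block = {(z, y, x) for z in range(2) for y in range(2) for x in range(2)}
fragment = {(5, 5, 5), (5, 5, 6)}
isolated = {(10, 10, 10)}
main_region = largest_component(block | fragment | isolated)
print(len(main_region))
```

On a 1 mm isotropic grid, a voxel count converts directly to a volume in mm³, so the size comparison above is also a comparison of physical volumes.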
Computationally-generated annotations. A 3D Convolutional Neural Network based on U-Net 34, with residual connections 35, was trained on all three preprocessed timepoints to perform automated segmentation of the STV, and the code has been made available for reproducibility. The models are trained using the multi-class Dice 36 loss function 37 with on-the-fly data augmentation techniques, such as ghosting, blur, and Gaussian noise, each applied at random with a given probability 38. All experiments use nested k-fold cross-validation, and the median Dice score across the holdout folds is 0.74. An initial learning rate of 0.01 is used and varied in a linear triangular fashion, with a minimum learning rate of 10⁻³ times the initial learning rate. The Stochastic Gradient Descent optimizer is used to update the weights of the network.
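The linear triangular learning-rate schedule can be sketched as follows. The text specifies only the initial rate (0.01) and the minimum (10⁻³ times the initial rate), so the cycle length `cycle_steps`, the function name `triangular_lr`, and the assumption that each cycle starts at the initial rate and descends to the minimum at mid-cycle are illustrative choices, not the authors' released code.

```python
def triangular_lr(step, initial_lr=0.01, min_factor=1e-3, cycle_steps=100):
    """Learning rate at `step` for a linear triangular schedule.

    The rate starts at `initial_lr`, falls linearly to
    `initial_lr * min_factor` at mid-cycle, and rises back by the
    end of the cycle. `cycle_steps` is a hypothetical parameter.
    """
    min_lr = initial_lr * min_factor
    half = cycle_steps / 2
    position = step % cycle_steps
    # 1.0 at the start/end of a cycle, 0.0 at the mid-cycle trough.
    fraction = abs(position - half) / half
    return min_lr + (initial_lr - min_lr) * fraction


# Sample the schedule at the start, quarter, middle, and end of one cycle.
for step in (0, 25, 50, 100):
    print(step, triangular_lr(step))
```

With these settings the rate oscillates between 0.01 and 0.00001, which matches the stated ratio between the initial and minimum learning rates.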

Data Records
We use the data 17 published through the ACRIN 6657/I-SPY1 TRIAL study 18. Specifically, we selected baseline subjects for whom at least two pre-operative post-contrast scans were available. The raw and generated data, which include the preprocessed images at an isotropic resolution of 1 mm³, the expert and computationally-generated annotations, and the extracted radiomic features, have been made available through TCIA's Analysis Results Directory (www.cancerimagingarchive.net/tcia-analysis-results/) at https://doi.org/10.7937/TCIA.XC7A-QT20 44. The computationally-generated annotations can serve as a benchmark for improving segmentation algorithms on this data in future computational studies.

Technical Validation
Data collection. The dataset was directly downloaded from TCIA and quantitatively analyzed to ensure that all images have a defined coordinate system and contain non-zero pixel values. Two cases, 1183 and 1187, had white image artifacts outside of the breast region. While these artifacts do not affect intensity distributions within the anatomical breast or the corresponding lesion segmentations, they may cause difficulties in image visualization and downstream analyses. These artifacts were present in the images directly downloaded from TCIA (illustrated in Fig. 2). Additionally, qualitative assessment was performed to look for any visual data corruption.
Preprocessing. Each step of preprocessing was followed by manual qualitative assessment of the image to ensure data validity. In addition, quantitative assessment was performed following the data harmonization step to ensure that the entire dataset had the same parametric definition (i.e., same resolution and pixel intensity distribution).
Expert tumor annotations. The expert-annotated STV segmentations were qualitatively assessed, manually edited, and approved by a board-certified, fellowship-trained breast radiologist.
Computationally-generated annotations. The FTV annotations were quantitatively compared with the corresponding STV annotations using the Dice score in order to quantify the difference between the two annotations. Additionally, a qualitative analysis was performed for the best and worst performing cases (illustrated in Fig. 3).
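The Dice score used for this comparison is twice the overlap of two masks divided by the sum of their sizes. A minimal sketch follows, assuming each mask is represented as a set of voxel coordinates. The function name `dice_score`, the toy masks, and the convention of returning 1.0 for two empty masks are illustrative assumptions.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap between two masks given as collections of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: a 4-voxel FTV mask fully contained in a 6-voxel STV mask.
ftv = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
stv = ftv | {(0, 2, 0), (0, 2, 1)}
print(dice_score(ftv, stv))  # 2 * 4 / (4 + 6) = 0.8
```

The score is below 1 even though the smaller mask is fully contained in the larger one, because Dice penalizes any difference in extent. An intensity-thresholded FTV can therefore score well below 1 against the whole-lesion STV.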